Goals for today:

  • Understand why linear models do not work well with some types of data (e.g., binary data)
  • Fit generalized linear models (GLMs), in particular binary binomial (a.k.a., logistic regression)
  • Visualize binomial GLMs

Why start with binary GLMs? Because these are the simplest of all GLMs. Not too much can go wrong with them. In future sessions we will work with other types of GLMs for which we will need to learn some more concepts and be more careful with assumptions.

Today we will need the following packages:

library(tidyverse) #including ggplot2
library(ggbeeswarm)
library(ggResidpanel)

library(glmmTMB)
library(DHARMa)
library(emmeans)

In addition, if you want to draw the diagrams representing our models, you can install:

library(DiagrammeR)

Failure of linear models

What a simple linear model is supposed to do

We can draw a simple linear model like:

grViz("
digraph boxes_and_circles {
node [shape = square, color = DeepSkyBlue]
response; predictor

node [shape = diamond, color=ForestGreen]
intercept; effect

node [shape = circle, color = black]
'expected value';

'expected value' -> response [ label = '  + rnorm(0,\U03C3)', style=dashed, fontcolor=ForestGreen]; 
intercept -> 'expected value';
predictor -> effect  [arrowhead = none, label = ' \U00D7']; 
effect -> 'expected value' ;
}")
grViz("
digraph boxes_and_circles {
node [shape = square, color = DeepSkyBlue]
response; predictor

node [shape = diamond, color=ForestGreen]
'rnorm() noise'; effect

node [shape = circle, color = black]
'expected value';

'expected value' -> response; 

predictor -> effect  [arrowhead = none, label = ' \U00D7']; 
effect -> 'expected value' ;
'rnorm() noise' -> response
}")

Or if you prefer, we can write an equation:

\[ response = intercept + effect \times predictor + residual \text{, with } residuals \sim Normal(0, \sigma) \] First, let us look at an example when a linear model performs fine. We simulate data following all the assumptions of a linear model with equation

First, we choose parameter values for the equation:

number_of_observations <- 30
intercept <- 1
effect <- 0.5
sigma_residual <- 1

Then we draw random numbers for the predictor (for which we don’t have an equation, so we could do whatever) and for the response according to the equation:

set.seed(934)
# 
# predictor <- rnorm(n = number_of_observations, mean = 0, sd = 1)
# 
# response <- intercept + effect*predictor + rnorm(n = number_of_observations,mean = 0, sd = sigma_residual)

data_linear <- tibble(predictor = rnorm(n = number_of_observations, mean = 0, sd = 1),
                      response = intercept + effect*predictor + rnorm(n = number_of_observations,mean = 0, sd = sigma_residual))

Let us visualize these data

(p <- ggplot(data_linear, aes(x=predictor, y=response))+
  geom_point())

We fit a simple linear model and visualise the model prediction:

lm0 <- lm(response~predictor, data = data_linear)
summary(lm0)
## 
## Call:
## lm(formula = response ~ predictor, data = data_linear)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.3702 -0.5418 -0.1358  0.5766  2.1599 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.7987     0.1586   5.037 2.51e-05 ***
## predictor     0.5990     0.1706   3.511  0.00153 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7867 on 28 degrees of freedom
## Multiple R-squared:  0.3057, Adjusted R-squared:  0.2809 
## F-statistic: 12.33 on 1 and 28 DF,  p-value: 0.001533
(p <- p + geom_smooth(method="lm"))
## `geom_smooth()` using formula 'y ~ x'

Next we draw the residuals:

p + geom_segment(aes(x=predictor, y=response, xend= predictor, yend=lm0$fitted.values))
## `geom_smooth()` using formula 'y ~ x'

Residuals look mostly unstructured, random and normally distributed. We can better assess these properties using diagnostic plots showing different properties of the residuals:

resid_panel(lm0)

Diagnostic plots may not look very good, but there is actually no serious violation of assumptions. That looks like an acceptable linear model.

When a linear model fails

Now let’s generate data with the same structure, except we convert the continuous values to binary values, following a logistic transformation and a Bernoulli distribution. For this type of data we generally want to model the probability of a 1 (survival, presence, success…). Data can be either 0 or 1. Although we cannot observe values between 0 and 1 (e.g., 0.5) in the data, we can interpret such a value: it is a probability (of observing a 1). On the other hand, values below 0 or above 1 are meaningless for a binary variable.

grViz("
digraph boxes_and_circles {
node [shape = square, color = DeepSkyBlue]
response; predictor

node [shape = diamond, color=ForestGreen]
intercept; effect

node [shape = circle, color = black]
'expected value';

'expected value' -> probability [ label = '  inverse logit']; 
probability -> response [ label = '  rbinom(size=1, p=probability)', style=dashed]; 
intercept -> 'expected value';
predictor -> effect  [arrowhead = none, label = ' \U00D7']; 
effect -> 'expected value' ;
}")

Or if you prefer, we can write an equation:

\[ response \sim \mathrm{Bernoulli} (\mathrm{logit}^{-1} (intercept + effect \times predictor) ) \]

set.seed(935)
number_observations <- 30
intercept <- 1
effect <- 2

predictor <- rnorm(n = number_observations,mean = 0,sd = 1)

latent <- intercept + effect*predictor
probability <- plogis(latent)
response <- sapply(probability, FUN=function(x){rbinom(1,1,x)})

data_binary <- tibble(predictor=predictor, response=response)

ggplot(data_binary, aes(x=predictor, y=response))+ geom_beeswarm(groupOnX = FALSE)

Let’s fit a linear regression to these data and visualise the result.

lm1 <- lm(response~predictor, data = data_binary)
summary(lm1)
## 
## Call:
## lm(formula = response ~ predictor, data = data_binary)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.67756 -0.42221  0.04016  0.36265  0.62673 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.62268    0.07761   8.024 9.75e-09 ***
## predictor    0.28967    0.08316   3.483  0.00165 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4236 on 28 degrees of freedom
## Multiple R-squared:  0.3023, Adjusted R-squared:  0.2774 
## F-statistic: 12.13 on 1 and 28 DF,  p-value: 0.001647
ggplot(data_binary, aes(x=predictor, y=response))+
geom_smooth(method="lm", fullrange=TRUE) + geom_point(alpha=0.5) +
geom_segment(aes(x=predictor, y=response, xend=predictor, yend=lm1$fitted.values), alpha=0.4) +
  xlim(c(-3,1.2))
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 3 rows containing non-finite values (stat_smooth).
## Warning: Removed 3 rows containing missing values (geom_point).
## Warning: Removed 3 rows containing missing values (geom_segment).

Several problems are apparent:

  • The regression line (in blue) goes below 0 and above 1.
  • The uncertainty (confidence interval in grey) gets larger for extreme values of the predictor, whereas it seems like we are pretty sure there will be only 0s below \(x=-2\) and only 1s above \(x=1\), the uncertainty should be smaller on the sides!
  • Residuals are not independent of each other (they are more or less ranked from left to right) and their variation is not constant (from left to right the variation is small, then big, then small).

In addition, if we look at the diagnostic plots:

resid_panel(lm1)

  • We see bands in the first plot: residuals are not independent of each other.
  • The QQplot and the histogram show signs of non-normality (although that is not a big problem! Linear models are pretty robust to non-normality)

That is not a good linear model. Inference from such a model is unreliable. In particular you should not trust any measure of uncertainty (Standard Error, Confidence Interval, p-value), and you should certainly not trust extrapolation.

The solution: fit GLM for binary data, the binomial family

Instead of a linear model, here we need to use a Generalized Linear Model (GLM). GLMs come in different flavours or families, each adapted to different types of response variables. Here we will talk about GLMs for binary data, which belong to the binomial family.

Before discussing what a GLM is in details, let’s see that it is very easy to fit one in R. We can use the function glm(), which works exactly like lm(), except that you need to specify a family:

glm1 <- glm(response~predictor, data = data_binary, family = "binomial")
summary(glm1)
## 
## Call:
## glm(formula = response ~ predictor, family = "binomial", data = data_binary)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6327  -1.0134   0.3569   0.8682   1.5313  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)   0.6969     0.4720   1.476   0.1398  
## predictor     1.7407     0.7132   2.440   0.0147 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 40.381  on 29  degrees of freedom
## Residual deviance: 29.754  on 28  degrees of freedom
## AIC: 33.754
## 
## Number of Fisher Scoring iterations: 5

The summary output looks very similar to that of lm(). From a quick look we see that the predictor has a positive effect on the response, as expected.

ANOVAs are difficult with GLMs, but we can still have a look to confirm that there is good evidence that the predictor has an effect on the response. We just need to specify test=Chisq.

anova(glm1, test = "Chisq")
## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: response
## 
## Terms added sequentially (first to last)
## 
## 
##           Df Deviance Resid. Df Resid. Dev Pr(>Chi)   
## NULL                         29     40.381            
## predictor  1   10.627        28     29.754 0.001114 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let’s see what the model prediction looks like:

(p <- ggplot(data_binary, aes(x=predictor, y=response))+ 
  geom_smooth(method="glm", method.args = list(family="binomial"), fullrange=TRUE) +
  geom_point())
## `geom_smooth()` using formula 'y ~ x'

That looks good!

What happens to the prediction when we extrapolate beyond the range of \(x\) values?

p + xlim(-10,10)
## `geom_smooth()` using formula 'y ~ x'

Two very good points:

  1. The prediction line never goes below 0 or above 1
  2. The uncertainty shrinks as we get to more extreme values of \(x\)

Two of the three problems we saw with lm() are fixed. What about the third one, the distribution of residuals?

Checking GLM assumptions

p + geom_segment(aes(x=predictor, y=response, xend=predictor, yend=fitted(glm1)))
## `geom_smooth()` using formula 'y ~ x'

resid_panel(glm1)

The residuals do not look much better with the glm() than with the lm(). But that is okay! GLMs do not have the same assumptions as LMs. In fact, for binary binomial GLMs there are very few assumptions. The only one you need to remember for now is that observations are assumed to be independent of each other conditional on predictors, or in different words, your model should not leave important blocking factors unaccounted for.

When working with GLMs, the function resid_panel() is useless.

What to do then? The basic idea is to simulate fake data from the model we fitted and compare them to real data. If the model is a good fit to the data, then the simulated data should look like the real data.

For instance, let’s visualise the mean and standard deviation of simulated versus real data

real_properties <- data_binary %>% 
  summarise(mean=mean(response), sd=sd(response)) %>%
  pivot_longer(cols = c(mean, sd), names_to = "property")

simulated_properties_glm <- map_dfr(.x = 1:1000, ~{simulate(glm1)}, .id = "simulation" ) %>%
  group_by(simulation) %>%
  summarise(mean=mean(sim_1), sd=sd(sim_1)) %>%
  pivot_longer(cols = c(mean, sd), names_to = "property")


  
simulated_properties_glm %>% ggplot(aes(y=value, x=property)) +
  geom_violin() + 
  geom_point(data=real_properties, col="red")

It looks like our model generates the most likely values of both mean and variance. That is good, but that is only 2 properties out of 30 data points, and we needed quite a lot of complex code to get there!

In addition, the linear model seems to perfect very well too:

simulated_properties_linearmodel <- map_dfr(.x = 1:1000, ~{simulate(lm1)}, .id = "simulation" ) %>%
  group_by(simulation) %>%
  summarise(mean=mean(sim_1), sd=sd(sim_1)) %>%
  pivot_longer(cols = c(mean, sd), names_to = "property")

simulated_properties_linearmodel %>% ggplot(aes(y=value, x=property)) +
  geom_violin() + 
  geom_point(data=real_properties, col="red")

Fortunately, there is package that automates and make the assessment mathematically more correct and powerful, by focusing on various aspects of the distribution of real and simulated residuals (transformed with some clever math), so that we can assess all aspects of the model fit (not just the mean and variance).

library(DHARMa)

We will generally use this package in two ways:

  1. Test whether the data generated have the same distribution as the real data (first plot), and whether the transformed residuals appear randomly distributed across the data generated by the model (second plot):
plot( simulateResiduals(glm1) )

  1. Test whether the model generates the right amount of variation (here we compare the standard deviation of transformed residuals, between real and simulated data):
testDispersion(glm1)

## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## dispersion = 1.135, p-value = 0.448
## alternative hypothesis: two.sided

Let’s see if the linear model performs well (NB: we already know lm1 is not great because of the output of resid_panel()):

plot( simulateResiduals(lm1) ) # pretty bad distribution of residuals
## Warning in newton(lsp = lsp, X = G$X, y = G$y, Eb = G$Eb, UrS = G$UrS, L =
## G$L, : Fitting terminated with step failure - check results carefully
## qu = 0.5, log(sigma) = -2.380996 : outer Newton did not converge fully.
## qu = 0.5, log(sigma) = -2.303154 : outer Newton did not converge fully.

testDispersion(lm1) #no problem here

## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## dispersion = 0.97814, p-value = 0.944
## alternative hypothesis: two.sided

Here the linear model is clearly not a good fit. A note of warning though: do not trust those tests and plots blindly! If you have good conceptual reasons to think a linear model is wrong for a dataset, don’t let non-significant p-values change your mind.

We will practice more how to use DHARMa in future sessions. For now know that it is a good option to test GLMs assumptions.

Start second session here!

Reminder: we simulated those data, using a generative model corresponding to a binomial GLM:

set.seed(935)
number_observations <- 30
intercept <- 1
effect <- 2

predictor <- rnorm(n = number_observations,mean = 0,sd = 1)

latent <- intercept + effect*predictor
probability <- plogis(latent)
response <- sapply(probability, FUN=function(x){rbinom(1,1,x)})

data_binary <- tibble(predictor=predictor, response=response)
ggplot(data_binary, aes(x=predictor, y=response))+ geom_beeswarm(groupOnX = FALSE)

A linear model (in red) shows many pathologies, whereas a binomial GLM does what we want:

ggplot(data_binary, aes(x=predictor, y=response))+ geom_beeswarm(groupOnX = FALSE) +
  geom_smooth(method="lm", col="red", fill="red", fullrange=TRUE) + 
  geom_smooth(method="glm", col="blue", fill= "blue", method.args = list(family="binomial"), fullrange=TRUE)  +
  xlim(c(-4, 4))
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'

What about data generation?

So, a GLM predicts probabilities, values between 0 and 1. But we observe only 0s and 1s. Does the GLM knows about that? Yes it does! The GLM relates probabilities to data in a very obvious way: for a predicted probability \(p\), the GLM thinks you should observe a proportion \(p\) of 1s and a proportion \(1-p\) of 0s. If it seems trivial, it is because it is. GLMs for binary data are really easy.

Other GLMs use more complex processes, so let us look more formally at how a binary GLM sees the world.

The distribution turning a probability \(p\) into 0s and 1s is the bernoulli distribution, a special case of the binomial distribution. We can draw random samples from the bernoulli distribution with rbinom(n=, size=1, prob=)

(bernoulli_sample <- rbinom(n = 1000, size = 1, prob = 0.3))
##    [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 1 1 1 0 0 0 1
##   [38] 0 0 0 0 0 0 0 1 0 0 0 0 1 1 0 1 0 1 0 1 1 0 0 0 1 1 0 0 1 0 1 0 1 0 0 1 0
##   [75] 0 0 0 0 1 1 0 0 0 0 1 1 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 0
##  [112] 1 0 1 1 1 1 0 0 0 0 0 1 1 0 1 1 0 1 1 0 0 0 0 1 0 1 1 1 0 0 0 0 0 1 0 0 1
##  [149] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 1 1 0 0 1
##  [186] 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
##  [223] 1 0 1 0 0 0 0 1 0 1 1 0 0 0 1 1 1 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 1 0 0
##  [260] 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0
##  [297] 0 1 1 0 0 0 0 0 1 0 1 0 0 1 0 1 0 1 1 1 0 1 1 0 0 0 0 0 1 0 0 1 0 0 1 0 1
##  [334] 0 0 1 0 0 0 0 1 0 1 0 1 0 1 0 1 0 1 0 0 0 0 0 0 0 1 1 0 1 0 1 1 0 1 0 0 0
##  [371] 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 0 1 0 0 1 1
##  [408] 1 0 0 1 1 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 1 0 1 0 0 0 0 1 0 0 0 0 0 1
##  [445] 0 1 0 1 0 0 1 0 1 1 0 0 1 0 1 0 0 0 0 1 1 0 0 0 0 1 1 0 1 1 0 1 0 1 1 0 0
##  [482] 0 0 0 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 1 0 1 0 0 1 0 1 0 1 1 0 0 0 0 0 0 0 1
##  [519] 0 0 0 1 0 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 1 0 1 1 1 1 1 0 0 0 1 0 0 1 0 0 0
##  [556] 1 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 0 0
##  [593] 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 1 0 1 0 0 1 0 0 0 0
##  [630] 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 1 1 0 0 0 0
##  [667] 0 1 1 0 1 1 0 0 1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 0 1 0 0 0 0 1
##  [704] 0 1 1 1 1 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 1 1 0 0 1 0 0 0 0 0 0 0 1
##  [741] 0 0 0 0 1 0 0 0 1 0 1 0 0 1 0 1 0 1 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0
##  [778] 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1 1 0 0 0 0 1 0 0 1 0 0 0 0
##  [815] 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0 1 0 0 0 1 0 0 0
##  [852] 1 1 0 0 0 0 1 0 0 1 1 1 0 0 0 0 1 0 1 0 0 0 1 0 0 1 1 0 0 0 0 1 0 0 0 0 0
##  [889] 1 0 0 1 0 0 0 0 0 0 0 1 0 1 1 0 1 1 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0
##  [926] 1 1 1 0 0 0 0 1 0 0 1 1 0 0 1 0 0 0 1 0 0 0 1 1 1 0 1 1 0 1 0 0 1 0 1 0 0
##  [963] 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 1 1 0 0 0 1 1 0 1 0 1 1 1 1 0 0 0 1 0
## [1000] 0
mean(bernoulli_sample)
## [1] 0.297

All GLMs use distributions that have some kind of relationship between their mean and their variance. For the bernoulli, the relationship is \(variance = mean*(1-mean)\), and the mean is expected to be equal to the probability.

var(bernoulli_sample)
## [1] 0.209
mean(bernoulli_sample)*(1-mean(bernoulli_sample))
## [1] 0.208791

That means that binary data are most variable for a probability of 0.5, and least variable for probabilities close to 0 or close to 1.

nb_rows <- 1000
variability_bernoulli <- tibble(lm_value= seq(-5,5, length.out = nb_rows),
       probability = plogis(lm_value),
       observation = rbinom(n = nb_rows, size = 1, prob = probability))

ggplot(variability_bernoulli, aes(x=lm_value, y=probability, col=probability)) +
  geom_line()+
  geom_point(inherit.aes = FALSE, aes(x=lm_value, y=observation))

Let’s simulate survival for each sex based on the predicted survival probabilities:

pred_survival <- summary(emmeans(vole_s_glm, ~sex, type = "response"))

simulated_survival <- pred_survival %>%  
  group_by(sex) %>%
  select(prob) %>%
  summarise(survival = rbinom(n = 100, size = 1, prob = prob), prob=prob)
## Adding missing grouping variables: `sex`
## `summarise()` has grouped output by 'sex'. You can override using the `.groups` argument.
simulated_survival %>% group_by(sex) %>%
  summarise(simulated_mean = mean(survival), expected_mean=mean(prob),
            simulated_var = var(survival), expected_var=mean(prob)*(1-mean(prob)))
## # A tibble: 2 × 5
##   sex    simulated_mean expected_mean simulated_var expected_var
##   <fct>           <dbl>         <dbl>         <dbl>        <dbl>
## 1 Female           0.31         0.278         0.216        0.201
## 2 Male             0.14         0.160         0.122        0.134

What a GLM is

From previous examples we saw what a GLM is:

  1. A linear model (response = intercept + slope × predictor . . . ), what you see with summary(glm1)
  2. A “Link function” = a map between the linear function (−∞ to +∞) and a probability distribution (from 0 to 1 for bernoulli)
  3. A probability distribution (bernoulli, Binomial, Poisson. . . ) assumed to generate the data (either 0 or 1 for bernoulli)

glm scales

\[ y_i = \alpha + \beta x_i \\ p_i = \mathrm{logit}^{-1}(y_i) \\ \mathrm{obs}_i \sim Bernoulli(p_i) \]

More practice with snow voles

Let’s consider the relationship between mass and survival.

ggplot(survdat_early, aes(x = mass, y=survival)) +
  geom_point(alpha=0.2)
## Warning: Removed 13 rows containing missing values (geom_point).

Difficult to see much on this plot. Better to add some jitter/beeswarm and a glm fit:

ggplot(survdat_early, aes(x = mass, y=survival)) +
  geom_beeswarm(groupOnX = FALSE) +
  geom_smooth(method = "glm", method.args=list(family="binomial"))
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 13 rows containing non-finite values (stat_smooth).
## Warning: Removed 13 rows containing missing values (position_beeswarm).

vole_glm <- glm(survival ~ mass, data = survdat_early)
summary(vole_glm)
## 
## Call:
## glm(formula = survival ~ mass, data = survdat_early)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.2954  -0.2466  -0.2015  -0.1490   0.8736  
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.336739   0.045189   7.452 2.41e-13 ***
## mass        -0.003755   0.001403  -2.677  0.00759 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.1718025)
## 
##     Null deviance: 137.64  on 795  degrees of freedom
## Residual deviance: 136.41  on 794  degrees of freedom
##   (13 observations deleted due to missingness)
## AIC: 860.87
## 
## Number of Fisher Scoring iterations: 2

So it looks like higher body mass corresponds to lower survival across the population.